AlgoWiki

Sieve of Eratosthenes

The Sieve of Eratosthenes is a classic algorithm for finding all prime numbers up to a given limit nn. Instead of testing each number for primality individually, it works the other way around: it starts by assuming every number is prime, and then repeatedly crosses out the multiples of each prime it finds. Whatever survives is exactly the set of primes. It runs in O(nloglogn)O(n \log \log n) time, which is fast enough to sieve up to 10710^710810^8 comfortably within typical contest limits, and it is the workhorse behind a huge number of number-theoretic tasks in competitive programming.

The sieve is the prototypical example of the more general Sieve technique: computing a value for every integer in a range by letting each prime (or each small factor) contribute to all of its multiples.

Description

We keep a boolean array is_prime[0..n], initially all true except for 00 and 11, which are not prime. We then scan i=2,3,4,i = 2, 3, 4, \dots in increasing order. When we reach an ii that is still marked prime, it genuinely is prime (every smaller factor would already have crossed it out), so we record it and mark all of its multiples as composite.

A key optimization is to start crossing out from i2i^2 rather than 2i2i: every smaller multiple kik \cdot i with k<ik < i has a prime factor smaller than ii and was therefore already eliminated. A direct consequence is that once i2>ni^2 > n there is nothing left to cross out, so the outer loop only needs to run while ini \le \sqrt{n}

const int N = 1000000;
bitset<N + 1> is_prime;
vector<int> primes;

void sieve() {
    is_prime.set();                 // assume everything is prime
    is_prime[0] = is_prime[1] = 0;
    for (int i = 2; i <= N; i++) {
        if (is_prime[i]) {
            primes.push_back(i);
            for (long long j = (long long)i * i; j <= N; j += i)
                is_prime[j] = 0;    // cross out multiples of i
        }
    }
}

Using a bitset (or vector<bool>) keeps the memory footprint to one bit per number, which both saves space and improves cache behaviour. Note the cast to long long when computing i * i: for nn near 2312^{31} the product overflows a 32-bit int

Complexity

The work done is proportional to the number of crossing-out operations, which is

pn, p primenp=npn1p=nloglogn+O(n),\sum_{p \le n,\ p \text{ prime}} \frac{n}{p} = n \sum_{p \le n} \frac{1}{p} = n \log \log n + O(n),

by Mertens' theorem on the sum of reciprocals of primes. Hence the running time is O(nloglogn)O(n \log \log n) — only marginally above linear. The memory usage is O(n)O(n) (one bit per number with a bitset).

Applications

  • Listing or counting primes. The most direct use: enumerate every prime up to nn, or answer "how many primes are x\le x?" with a prefix-sum over the sieve.
  • Fast repeated primality tests. After one O(nloglogn)O(n \log \log n) precomputation, each query "is xx prime?" for xnx \le n is answered in O(1)O(1)
  • Fast factorization. By storing the smallest prime factor of every number (see below), any xnx \le n can be factored in O(logx)O(\log x) time — far faster than O(x)O(\sqrt{x}) trial division when many numbers must be factored.
  • Computing multiplicative functions. Functions such as Euler's totientφ\varphi, the Möbius functionμ\mu, the divisor count τ\tau and divisor sum σ\sigma can all be tabulated for every integer up to nn during a single sieve (see Sieving multiplicative functions). These tables are the backbone of Möbius inversion and inclusion–exclusion counting arguments.

Variants

Smallest prime factor and fast factorization

A small change turns the sieve into a tool for factorizing numbers quickly. Instead of a boolean flag, store for each number the smallest prime that divides it. The linear sieve (also called the Euler sieve) does this in genuinely linear O(n)O(n) time by guaranteeing that every composite is crossed out exactly once — by its smallest prime factor.

const int N = 1000000;
int spf[N + 1];          // spf[x] = smallest prime factor of x
vector<int> primes;

void linear_sieve() {
    for (int i = 2; i <= N; i++) {
        if (spf[i] == 0) {              // i is prime
            spf[i] = i;
            primes.push_back(i);
        }
        for (int p : primes) {
            if (p > spf[i] || (long long)i * p > N) break;
            spf[i * p] = p;
        }
    }
}

The crucial line is the loop break: we only mark ipi \cdot p for primes pspf(i)p \le \mathrm{spf}(i), so each composite m=ipm = i \cdot p is touched exactly once, namely when i=m/pi = m / p and pp is the smallest prime factor of mm. With spf in hand, factorization is a short loop:

vector<int> factorize(int x) {          // returns prime factors with multiplicity
    vector<int> f;
    while (x > 1) { f.push_back(spf[x]); x /= spf[x]; }
    return f;
}

In practice the plain bitset sieve is often faster than the linear sieve for merely listing primes, because it touches memory more cache-friendly and avoids the inner vector iteration; the linear sieve earns its keep when you also need spf or a multiplicative function table.

Segmented sieve

What if you need the primes in a high range such as [109,109+106][10^9, 10^9 + 10^6]? An array of size 10910^9 is far too large, but the range itself is small. The segmented sieve exploits the fact that every composite H\le H has a prime factor H\le \sqrt{H}. So we first sieve the primes up to H\sqrt{H} with the ordinary sieve, then use them to cross out composites in the window [L,H][L, H], using an array indexed by offset xLx - L

// requires `primes` to contain all primes up to floor(sqrt(hi))
vector<char> segmented_sieve(long long lo, long long hi) {   // primes in [lo, hi]
    vector<char> is_prime(hi - lo + 1, 1);
    if (lo == 1) is_prime[0] = 0;
    for (int p : primes) {
        if ((long long)p * p > hi) break;
        long long start = max((long long)p * p, (lo + p - 1) / p * p);
        for (long long j = start; j <= hi; j += p)
            is_prime[j - lo] = 0;
    }
    return is_prime;
}

This runs in O((HL)loglogH+H)O\big((H - L)\log\log H + \sqrt{H}\big) time using only O(H+(HL))O\big(\sqrt{H} + (H - L)\big) memory.

Sieving multiplicative functions

A function ff is multiplicative if f(ab)=f(a)f(b)f(ab) = f(a)f(b) whenever gcd(a,b)=1\gcd(a,b)=1. The linear sieve can compute such a function for every integer up to nn in O(n)O(n) time, because when it marks a composite ipi \cdot p it knows whether pp already divides ii, which is exactly the information needed to extend ff

For example, Euler's totientφ\varphi and the Möbius functionμ\mu

int phi[N + 1], mu[N + 1];
bool composite[N + 1];
vector<int> primes;

void multiplicative_sieve() {
    phi[1] = mu[1] = 1;
    for (int i = 2; i <= N; i++) {
        if (!composite[i]) { primes.push_back(i); phi[i] = i - 1; mu[i] = -1; }
        for (int p : primes) {
            if ((long long)i * p > N) break;
            composite[i * p] = true;
            if (i % p == 0) {              // p divides i: p is a repeated factor
                phi[i * p] = phi[i] * p;
                mu[i * p] = 0;
                break;
            } else {                       // p is a new prime factor
                phi[i * p] = phi[i] * (p - 1);
                mu[i * p] = -mu[i];
            }
        }
    }
}

The same skeleton tabulates the number of divisors τ\tau, the sum of divisors σ\sigma, and many other functions — only the two update lines change.

Problems

The list below is grouped by which idea the problem exercises. Several entries include a short solution sketch behind a spoiler; it focuses only on how the sieve is used, not on a full editorial.

Basic sieve

Fast factorization with the smallest prime factor

Solution sketch — Counting Divisors

Precompute spf once with a linear sieve up to 10610^6. For each query value xx, factorize it in O(logx)O(\log x) using spf, collect the exponents e1,,eke_1, \dots, e_k of its distinct primes, and output τ(x)=(ei+1)\tau(x) = \prod (e_i + 1). Total time O(nmax+qlogx)O(n_{\max} + q \log x), far better than O(qx)O(q\sqrt{x}) trial division when there are many queries.

Solution sketch — Sherlock and his girlfriend

Two numbers where one is a prime multiple of the other must get different colours, so the answer is 11 only when there is a single item, and otherwise 22. A valid 22-colouring is to colour each item by the parity of the number of prime factors of its value (with multiplicity); adjacent values in the constraint differ by exactly one prime factor and so always disagree. The spf sieve gives both the factor count and the primality test cheaply.

Counting / prefix tricks over the sieve

Solution sketch — Bear and Prime Numbers

For every prime p107p \le 10^7, count how many array elements are divisible by pp; this is itself a sieve-style loop (for each prime, walk its multiples and add up their frequencies). Build a prefix sum over primes indexed by value. A query [l,r][l, r] is then answered in O(1)O(1) as the difference of two prefix sums.

Multiplicative-function sieves

Solution sketch — Counting Coprime Pairs

Let cdc_d be how many array elements are divisible by dd (a sieve-style count). By Möbius inversion, the number of pairs (i,j)(i,j) with gcd(ai,aj)=1\gcd(a_i,a_j)=1 is

d1μ(d)(cd2),\sum_{d \ge 1} \mu(d)\binom{c_d}{2},

where μ\mu comes from the multiplicative sieve. The whole solution is O(AlogA)O(A \log A) where AA is the maximum value.

Segmented sieve

See also